Maximizing a monotone submodular function is a fundamental task in machine learning, economics, and statistics. In this paper, we present two communication-efficient decentralized online algorithms for the monotone continuous DR-submodular maximization problem, both of which reduce the number of per-function gradient evaluations and the per-round communication complexity from $T^{3/2}$ to $1$. First, we present the one-shot decentralized Meta-Frank-Wolfe algorithm (Mono-DMFW), which achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. As far as we know, this is the first one-shot and projection-free decentralized online algorithm for monotone continuous DR-submodular maximization. Next, inspired by the non-oblivious boosting function \citep{zhang2022boosting}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm, which attains a $(1-1/e)$-regret of $O(\sqrt{T})$. To the best of our knowledge, this is the first result to obtain the optimal $O(\sqrt{T})$ regret against a $(1-1/e)$-approximation with only one gradient query per step for each local objective function. Finally, various experimental results confirm the effectiveness of the proposed methods.
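To make the boosting step above concrete, here is a minimal sketch of the kind of update a DOBGA-style method might run at node $i$, assuming the non-oblivious surrogate of \citep{zhang2022boosting}; the mixing weights $w_{ij}$, the step size $\eta_t$, and the one-sample estimator $\widetilde{\nabla}$ are our illustrative assumptions, not the paper's verbatim recursion:

$$\nabla F_{i,t}(\boldsymbol{x}) = \int_{0}^{1} e^{z-1}\,\nabla f_{i,t}(z\boldsymbol{x})\,dz, \qquad \boldsymbol{x}_{i}^{t+1} = \mathcal{P}_{\mathcal{K}}\Big(\sum_{j} w_{ij}\,\boldsymbol{x}_{j}^{t} + \eta_{t}\,\widetilde{\nabla} F_{i,t}(\boldsymbol{x}_{i}^{t})\Big).$$

Ascending the surrogate $F_{i,t}$ rather than $f_{i,t}$ is what lifts the guarantee attainable by plain gradient ascent (a $1/2$ approximation) to $1-1/e$, while a single stochastic gradient per round suffices because drawing $z$ uniformly on $[0,1]$ and querying $\nabla f_{i,t}(z\boldsymbol{x})$ with weight $e^{z-1}$ gives an unbiased estimate of $\nabla F_{i,t}$.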
In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in the domains of machine learning, economics, and operations research. At first, we present the Meta-MFW algorithm, achieving a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, Meta-MFW is the first algorithm to obtain a $1/e$-regret of $O(\sqrt{T})$ for this problem. Furthermore, in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies on a simple online linear oracle without discretization, lifting, or rounding operations. Considering practical restrictions, we then propose the Mono-MFW algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to one and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms to explore the one-shot and bandit settings, respectively, for online non-monotone DR-submodular maximization over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
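For orientation, the $1/e$ factor over a down-closed set is typically obtained with a shrunken (measured) Frank-Wolfe step; we are assuming Meta-MFW's inner loop has this familiar shape, with $K$ steps and directions $\boldsymbol{v}^{(k)}$ supplied by online linear-maximization oracles, which the paper should be consulted to confirm:

$$\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \frac{1}{K}\,\boldsymbol{v}^{(k)} \odot \big(\boldsymbol{1}-\boldsymbol{x}^{(k)}\big),$$

where $\odot$ denotes the coordinate-wise product. The shrinking factor $(\boldsymbol{1}-\boldsymbol{x}^{(k)})$ keeps the iterate inside the down-closed set and is the standard source of the $1/e$ (rather than $1-1/e$) approximation for non-monotone objectives.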
In this paper, we revisit constrained and stochastic continuous submodular maximization in both offline and online settings. For each $\gamma$-weakly DR-submodular function $f$, we use a factor-revealing optimization problem to derive an optimal auxiliary function $F$, whose stationary points provide a $(1-e^{-\gamma})$-approximation to the global maximum (denoted by $OPT$) of the problem $\max_{\boldsymbol{x}\in\mathcal{C}} f(\boldsymbol{x})$. Naturally, projected (mirror) gradient ascent relying on this non-oblivious function achieves $(1-e^{-\gamma}-\epsilon^{2})OPT-\epsilon$ after $O(1/\epsilon^{2})$ iterations, beating the traditional $(\frac{\gamma^{2}}{1+\gamma^{2}})$-approximation gradient ascent \citep{hassani2017gradient} for submodular maximization. Similarly, based on $F$, the classical Frank-Wolfe algorithm equipped with a variance-reduction technique \citep{mokhtari2018conditional} also returns a solution exceeding $(1-e^{-\gamma}-\epsilon^{2})OPT-\epsilon$ after $O(1/\epsilon^{3})$ iterations. In the online setting, we first consider adversarial delays for the stochastic gradient feedback, under which we propose a boosted online gradient algorithm with the same non-oblivious search, achieving a regret of $O(\sqrt{D})$ (where $D$ is the sum of the delays of the gradient feedback) against a $(1-e^{-\gamma})$-approximation to the best feasible solution in hindsight. Finally, extensive numerical experiments demonstrate the efficiency of our boosting methods.
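For the monotone case $\gamma=1$ (and assuming $f(\boldsymbol{0})=0$), the non-oblivious function from this boosting line of work, as we recall it, takes the form

$$F(\boldsymbol{x}) = \int_{0}^{1} \frac{e^{z-1}}{z}\, f(z\boldsymbol{x})\,dz, \qquad \nabla F(\boldsymbol{x}) = \int_{0}^{1} e^{z-1}\,\nabla f(z\boldsymbol{x})\,dz,$$

so that any stationary point of $F$ over $\mathcal{C}$ is a $(1-e^{-1})$-approximate maximizer of $f$; in practice $\nabla F(\boldsymbol{x})$ admits a one-sample unbiased estimate by drawing a random $z$ and querying $\nabla f(z\boldsymbol{x})$. The general $\gamma$-weakly weighting comes out of the factor-revealing program and may differ in its kernel; consult the paper for the exact form.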
Motivated by many interesting real-world applications in logistics and online advertising, we consider an online allocation problem subject to lower and upper resource constraints, where requests arrive sequentially, sampled i.i.d. from an unknown distribution, and we need to promptly make decisions under limited resources and lower-bound requirements. First, with knowledge of the measure of feasibility, i.e., $\alpha$, we propose a new algorithm that attains a competitive ratio of $1-O(\frac{\epsilon}{\alpha-\epsilon})$ relative to the offline problem that knows all requests in advance. Inspired by previous studies, this algorithm adopts an innovative technique that dynamically updates a threshold price vector for decision making. Moreover, an optimization method for estimating the optimal measure of feasibility is proposed, with a theoretical guarantee, at the end of this paper. Based on this method, if we tolerate a slight violation of the lower-bound constraints with parameter $\eta$, the proposed algorithm naturally extends to the setting without the strong feasibility assumption, which covers significantly more unexplored infeasible scenarios.
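The following minimal Python sketch conveys the general flavor of threshold-price rules for online allocation; the additive dual-price update, the acceptance test, and all names are our illustrative assumptions, not the paper's algorithm, and the lower-bound requirements (which would need a second, oppositely-signed price vector) are omitted.

import numpy as np

def online_allocate(requests, capacity, horizon, eta=0.05):
    """Accept a request when its reward beats the current price of the
    resources it consumes, then adjust prices by dual gradient descent.
    `requests` yields (reward, cost_vector) pairs; a hypothetical interface."""
    prices = np.zeros_like(capacity, dtype=float)   # threshold price vector
    used = np.zeros_like(capacity, dtype=float)
    target_rate = capacity / horizon                # per-round consumption budget
    decisions = []
    for reward, cost in requests:
        accept = (reward > prices @ cost) and np.all(used + cost <= capacity)
        if accept:
            used = used + cost
        consumed = cost if accept else np.zeros_like(cost)
        # Resources consumed faster than their budgeted rate become pricier.
        prices = np.maximum(prices + eta * (consumed - target_rate), 0.0)
        decisions.append(accept)
    return decisions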
In this work, we propose a Robust, Efficient, and Component-specific makeup transfer method (abbreviated as BeautyREC). In a unique departure from prior methods that leverage global attention, simply concatenate features, or implicitly manipulate features in latent space, we propose a component-specific correspondence to directly transfer the makeup style of a reference image to the corresponding components (e.g., skin, lips, eyes) of a source image, enabling elaborate and accurate local makeup transfer. As an auxiliary, the long-range visual dependencies of Transformer are introduced for effective global makeup transfer. Instead of the commonly used cycle structure, which is complex and unstable, we employ a content consistency loss coupled with a content encoder to implement efficient single-path makeup transfer. The key insights of this study are modeling component-specific correspondence for local makeup transfer, capturing long-range dependencies for global makeup transfer, and enabling efficient makeup transfer via a single-path structure. We also contribute BeautyFace, a makeup transfer dataset that supplements existing datasets. This dataset contains 3,000 faces, covering more diverse makeup styles, face poses, and races. Each face has an annotated parsing map. Extensive experiments demonstrate the effectiveness of our method against state-of-the-art methods. Besides, our method is appealing in that it has only 1M parameters, outperforming the state-of-the-art methods (BeautyGAN: 8.43M, PSGAN: 12.62M, SCGAN: 15.30M, CPM: 9.24M, SSAT: 10.48M).
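As an illustration of the single-path training signal described above, a content consistency objective of roughly the following shape could replace a cycle reconstruction; the encoder interface, the L1 choice, and the detached target are our assumptions, not BeautyREC's exact formulation.

import torch
import torch.nn.functional as F

def content_consistency_loss(content_encoder, source, transferred):
    """Constrain the makeup-transferred output to preserve the source's
    content features, avoiding a second (cycle) generation pass.
    `content_encoder` is a hypothetical feature extractor module."""
    target = content_encoder(source).detach()   # content of the source image
    return F.l1_loss(content_encoder(transferred), target)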
Purely data-driven deep neural networks (DNNs) applied to physical engineering systems can infer relations that violate physics laws, thus leading to unexpected consequences. To address this challenge, we propose a physics-model-based DNN framework, called Phy-Taylor, that accelerates learning compliant representations with physical knowledge. The Phy-Taylor framework makes two key contributions. It introduces a new architectural physics-compatible neural network (PhN), and features a novel compliance mechanism that we call {\em Physics-guided Neural Network Editing\/}. The PhN aims to directly capture nonlinearities inspired by physical quantities, such as kinetic energy, potential energy, electrical power, and aerodynamic drag force. To do so, the PhN augments neural network layers with two key components: (i) expanded terms of the Taylor series for capturing nonlinear functions that encode physical knowledge, and (ii) a suppressor for mitigating the influence of noise. The neural-network editing mechanism further modifies network links and activation functions to be consistent with physical knowledge. As an extension, we also propose a self-correcting Phy-Taylor framework that introduces two additional capabilities: (i) physics-model-based safety relation learning, and (ii) automatic output correction when violations of safety occur. Through experiments, we show that (by directly expressing hard-to-learn nonlinearities and by constraining dependencies) Phy-Taylor features considerably fewer parameters and a remarkably accelerated training process, while offering enhanced model robustness and accuracy.
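To ground the PhN description, here is a minimal numpy sketch of a layer that augments its input with Taylor-series monomials and applies a noise suppressor; the truncation order, the hard-threshold suppressor, and all names are our illustrative assumptions, not the paper's exact construction.

import numpy as np
from itertools import combinations_with_replacement

def taylor_features(x, order=2):
    """Augment input x with all monomials of degree <= order
    (the Taylor-expansion node of a physics-compatible layer)."""
    feats = [np.ones(1)]  # degree-0 term
    for d in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)], keepdims=True))
    return np.concatenate(feats)

def phn_layer(x, W, b, noise_floor=1e-3):
    """Linear map over the Taylor monomials followed by a suppressor
    that zeroes activations below a noise floor (illustrative choice)."""
    z = W @ taylor_features(x) + b
    return np.where(np.abs(z) < noise_floor, 0.0, z)

For a 2-dimensional input and order=2, the feature vector is $(1, x_1, x_2, x_1^2, x_1x_2, x_2^2)$, so physics-flavored terms such as a kinetic-energy-like $v^2$ appear as explicit features rather than being approximated through depth.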
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
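The token-relation finding above can be pictured with the following sketch, where a relation is a softmax-normalized pairwise token affinity matrix matched between teacher and student; the temperature and the use of plain feature dot-products (rather than the paper's exact Q/K relation maps) are our assumptions.

import numpy as np

def relation_map(tokens, tau=1.0):
    """Softmax-normalized token-to-token affinity matrix (N x N)."""
    sim = tokens @ tokens.T / (tau * np.sqrt(tokens.shape[1]))
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(sim)
    return e / e.sum(axis=1, keepdims=True)

def token_relation_distill_loss(student_tokens, teacher_tokens):
    """Cross-entropy between teacher and student relation maps,
    instead of matching CLS tokens or raw features directly."""
    p_teacher = relation_map(teacher_tokens)   # fixed target
    p_student = relation_map(student_tokens)
    return -(p_teacher * np.log(p_student + 1e-12)).sum(axis=1).mean()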
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
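To make the attack surface concrete, the sketch below shows a NAIVEATTACK-style step of stamping a trigger patch onto raw images before distillation begins; the patch size, location, poison rate, and target label are illustrative assumptions (DOORPING would instead re-optimize the trigger throughout the distillation procedure).

import numpy as np

def stamp_trigger(images, labels, target_label, rate=0.05, patch=4):
    """Poison a fraction of the raw training set before distillation:
    paste a white square in the bottom-right corner and relabel.
    Assumes HWC images with values in [0, 1]."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -patch:, -patch:, :] = 1.0
    labels[idx] = target_label
    return images, labels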
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression task, in line with the easy-to-hard law of the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and that both the MS and PMT modules improve the model's performance.
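One plausible reading of the progressive multi-task module is a training-time shift of loss weight from an easier auxiliary task toward the harder quality-regression task; the sigmoid schedule and the task pairing below are our assumptions, not necessarily PMT-IQA's exact scheme.

import math

def progressive_weights(step, total_steps, k=10.0):
    """Shift loss weight from an easy auxiliary task (e.g., distortion
    classification) to the hard task (quality regression) over training."""
    t = step / total_steps                              # progress in [0, 1]
    w_hard = 1.0 / (1.0 + math.exp(-k * (t - 0.5)))     # sigmoid ramp-up
    return 1.0 - w_hard, w_hard                         # (w_easy, w_hard)

# Usage: w_easy, w_hard = progressive_weights(step, total_steps)
#        total_loss = w_easy * loss_cls + w_hard * loss_reg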